Inverse Reinforcement Learning through Structured Classification

Authors

  • Edouard Klein
  • Matthieu Geist
  • Bilal Piot
  • Olivier Pietquin
Abstract

This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.
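As a concrete illustration of the principle stated in the abstract, here is a minimal sketch: a Monte-Carlo estimate of the expert's feature expectation is used as the score feature of a linear multiclass classifier trained to reproduce the expert's actions, and the learned weights define the reward r(s) = θ·φ(s). Everything beyond that principle is an assumption made for the example: hashable states, integer actions, the helper names (`phi`, `expert_feature_expectations`, `mu_feature`, `scirl_sketch`), the hinge-loss subgradient training loop, and the crude discounting heuristic for non-expert actions are illustrative choices, not the paper's exact construction.

```python
import numpy as np


def expert_feature_expectations(trajectories, phi, gamma):
    """Monte-Carlo estimate of mu_E(s): the discounted sum of features
    phi(s_t) along the expert trajectory, starting from each visited state."""
    sums, counts = {}, {}
    for traj in trajectories:
        feats = [phi(s) for s, _ in traj]
        running = np.zeros(len(feats[0]))
        partial = [None] * len(traj)
        # Backward pass: partial[t] = phi(s_t) + gamma * partial[t + 1]
        for t in reversed(range(len(traj))):
            running = feats[t] + gamma * running
            partial[t] = running
        for t, (s, _) in enumerate(traj):
            sums[s] = sums.get(s, 0.0) + partial[t]
            counts[s] = counts.get(s, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}


def mu_feature(s, a, expert_action, mu_E, gamma):
    """Feature expectation used as the classifier's score feature for (s, a).
    For the expert's own action this is the Monte-Carlo estimate; for other
    actions a crude discounting heuristic stands in (an assumption here,
    not the heuristic referred to in the abstract)."""
    return mu_E[s] if a == expert_action else gamma * mu_E[s]


def scirl_sketch(trajectories, phi, n_actions, gamma=0.99, lr=0.05, epochs=50):
    """Fit theta so that argmax_a theta . mu(s, a) reproduces the expert's
    action (multiclass hinge loss, subgradient descent), then return the
    learned reward r(s) = theta . phi(s)."""
    mu_E = expert_feature_expectations(trajectories, phi, gamma)
    data = [(s, a) for traj in trajectories for s, a in traj]
    theta = np.zeros(len(phi(data[0][0])))
    for _ in range(epochs):
        for s, a_exp in data:
            feats = [mu_feature(s, a, a_exp, mu_E, gamma) for a in range(n_actions)]
            # Margin of 1 on every non-expert action.
            scores = [theta @ f + (0.0 if a == a_exp else 1.0)
                      for a, f in enumerate(feats)]
            a_hat = int(np.argmax(scores))
            if a_hat != a_exp:
                theta = theta + lr * (feats[a_exp] - feats[a_hat])
    return lambda s: float(theta @ phi(s))
```

The reward returned by such a procedure can then be handed to any forward RL solver; the point made in the abstract is that, by construction, the expert's action maximizes the classifier's score in each state, which is what underlies the near-optimality guarantee for the expert policy under the learned reward.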


Similar Articles

From Structured Prediction to Inverse Reinforcement Learning

Machine learning is all about making predictions; language is full of complex rich structure. Structured prediction marries these two. However, structured prediction isn’t always enough: sometimes the world throws even more complex data at us, and we need reinforcement learning techniques. This tutorial is all about the how and the why of structured prediction and inverse reinforcement learning...


Structured Classification for Inverse Reinforcement Learning

This paper addresses the Inverse Reinforcement Learning (IRL) problem, which is a particular case of learning from demonstrations. The IRL framework assumes that an expert, demonstrating a task, is acting optimally with respect to an unknown reward function to be discovered. Unlike most existing IRL algorithms, the proposed approach doesn't require any of the following: complete trajectories ...


On Correcting Inputs: Inverse Optimization for Online Structured Prediction

Algorithm designers typically assume that the input data is correct, and then proceed to find "optimal" or "sub-optimal" solutions using this input data. However, this assumption of correct data does not always hold in practice, especially in the context of online learning systems where the objective is to learn appropriate feature weights given some training samples. Such scenarios necessitate ...


Around Inverse Reinforcement Learning and Score-based Classification

Inverse reinforcement learning (IRL) aims at estimating an unknown reward function optimized by some expert agent from interactions between this expert and the system to be controlled. One of its major application fields is imitation learning, where the goal is to imitate the expert, possibly in situations not encountered before. A classic and simple way to handle this problem is to see it as a...



Journal:

Volume   Issue

Pages   -

Publication date: 2012